Homework 6


Important:

All answers must be rounded to 3 decimal places, except the last problem, which requires the exact answer.


Question 1
?/? point (graded)

In this question, we will train a Naive Bayes classifier to predict class labels Y as a function of the input features.

We are given the following 15 training points:

What is the maximum likelihood estimate of the prior P(Y)?

Y P(Y)
A [q1.1]
B [q1.2]
C [q1.3] 

What are the maximum likelihood estimates of the conditional probability distributions? Fill in the tables below (the second and third are done for you).

x  Y  P(x | Y)
0 A [q1.4]
1 A [q1.5]
0 B [q1.6]
1 B [q1.7]
0 C [q1.8]
1 C [q1.9]
x  Y  P(x | Y)
0 A 1.000
1 A 0.000
0 B 0.222
1 B 0.778
0 C 0.250
1 C 0.750
x  Y  P(x | Y)
0 A 0.500
1 A 0.500
0 B 0.000
1 B 1.000
0 C 0.500
1 C 0.500

q1.1 =

q1.2 =

q1.3 =

q1.4 =

q1.5 =

q1.6 =

q1.7 =

q1.8 =

q1.9 =
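
For reference, here is a minimal sketch (in Python) of how maximum-likelihood estimates like the ones above are computed from counts. Since the table of 15 training points did not survive in this copy, the data rows below are hypothetical placeholders, and mle_conditional is just an illustrative helper name: the prior is the fraction of training points with each label, and each conditional is the fraction of points with a given feature value among points with that label.

from collections import Counter, defaultdict

# Hypothetical stand-in for the 15 training points: (feature1, feature2, feature3, label).
data = [
    (0, 1, 0, 'A'),
    (1, 1, 1, 'B'),
    (0, 0, 1, 'C'),
    # ... remaining training points would go here
]

n = len(data)
label_counts = Counter(y for *_, y in data)

# Prior P(Y = y): fraction of training points with label y.
prior = {y: c / n for y, c in label_counts.items()}

# Conditional P(feature_i = v | Y = y): count of points with value v and label y,
# divided by the count of points with label y.
def mle_conditional(feature_index):
    pair_counts = defaultdict(int)
    for row in data:
        pair_counts[(row[feature_index], row[-1])] += 1
    return {(v, y): c / label_counts[y] for (v, y), c in pair_counts.items()}

print({y: round(p, 3) for y, p in prior.items()})
print(mle_conditional(0))   # table for the first feature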


Question 2
?/? point (graded)

Following Question 1, now consider a new data point. Use your classifier to determine the joint probability of causes Y and this new data point, along with the posterior probability of Y given the new data:

Y  P(Y, new point)
A [q2.1]
B [q2.2]
C [q2.3]
Y  P(Y | new point)
A [q2.4]
B [q2.5]
C [q2.6]

What label does your classifier give to the new data point? (Break ties alphabetically). Enter a single capital letter.

[q2.7]

q2.1 =

q2.2 =

q2.3 =

q2.4 =

q2.5 =

q2.6 =

q2.7 =
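
A minimal sketch of the scoring step, assuming the tables above: the joint is the prior times the product of per-feature conditionals (the Naive Bayes assumption), and the posterior is the joint normalised over all labels. The prior, the first conditional table, and the new point's feature values below are hypothetical placeholders (they did not survive in this copy); the second and third conditional tables reuse the values given in Question 1.

prior = {'A': 0.400, 'B': 0.333, 'C': 0.267}      # placeholder prior
cond = [                                           # one table per feature: (label, value) -> P(value | label)
    {('A', 0): 0.5, ('A', 1): 0.5, ('B', 0): 0.2, ('B', 1): 0.8,
     ('C', 0): 0.5, ('C', 1): 0.5},                # placeholder for the first (unfilled) table
    {('A', 0): 1.0, ('A', 1): 0.0, ('B', 0): 0.222, ('B', 1): 0.778,
     ('C', 0): 0.25, ('C', 1): 0.75},              # second table, as given above
    {('A', 0): 0.5, ('A', 1): 0.5, ('B', 0): 0.0, ('B', 1): 1.0,
     ('C', 0): 0.5, ('C', 1): 0.5},                # third table, as given above
]
x_new = (0, 1, 1)                                  # hypothetical new data point

# Joint P(Y = y, x) = P(y) * prod_i P(x_i | y), using the Naive Bayes assumption.
joint = {y: prior[y] * cond[0][(y, x_new[0])] * cond[1][(y, x_new[1])] * cond[2][(y, x_new[2])]
         for y in prior}

# Posterior P(y | x): normalise the joint over all labels.
evidence = sum(joint.values())
posterior = {y: joint[y] / evidence for y in joint}

# Predicted label: highest posterior, ties broken alphabetically.
prediction = min(posterior, key=lambda y: (-posterior[y], y))
print({y: round(posterior[y], 3) for y in posterior}, prediction)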


Question 3
?/? point (graded)

The training data is repeated here for your convenience:

Following the previous questions, now use Laplace Smoothing with strength k = 3 to estimate the prior P(Y) for the same data.

Y P(Y)
A [q3.1]
B [q3.2]
C [q3.3]

Use Laplace Smoothing with strength k = 3 to estimate the conditional probability distributions below (again, the second and third are done for you).

x  Y  P(x | Y)
0 A [q3.4]
1 A [q3.5]
0 B [q3.6]
1 B [q3.7]
0 C [q3.8]
1 C [q3.9]
x  Y  P(x | Y)
0 A 0.625
1 A 0.375
0 B 0.333
1 B 0.667
0 C 0.400
1 C 0.600
x  Y  P(x | Y)
0 A 0.500
1 A 0.500
0 B 0.200
1 B 0.800
0 C 0.500
1 C 0.500

q3.1 =

q3.2 =

q3.3 =

q3.4 =

q3.5 =

q3.6 =

q3.7 =

q3.8 =

q3.9 =
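
A minimal sketch of the Laplace-smoothing rule with strength k: add k pseudo-counts to every possible outcome before normalising. The counts in the example call are hypothetical; the real ones come from the 15 training points.

k = 3

# Smoothed prior: P(y) = (count(y) + k) / (n + k * |labels|).
def smoothed_prior(label_counts, n, num_labels, k=3):
    return {y: (c + k) / (n + k * num_labels) for y, c in label_counts.items()}

# Smoothed conditional: P(F = v | Y = y) = (count(v, y) + k) / (count(y) + k * |values of F|).
def smoothed_conditional(pair_count, label_count, num_values, k=3):
    return (pair_count + k) / (label_count + k * num_values)

# Hypothetical example: label seen 6 times, value seen with that label 2 times, binary feature.
print(round(smoothed_conditional(2, 6, 2), 3))   # (2 + 3) / (6 + 3 * 2) = 0.417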


Question 4
?/? point (graded)

Now consider again the new data point. Use the Laplace-Smoothed version of your classifier to determine the joint probability of causes Y and this new data point, along with the posterior probability of Y given the new data:

Y  P(Y, new point)
A [q4.1]
B [q4.2]
C [q4.3]
Y  P(Y | new point)
A [q4.4]
B [q4.5]
C [q4.6]

What label does your (Laplace-Smoothed) classifier give to the new data point? (Break ties alphabetically). Enter a single capital letter.

[q4.7]

q4.1 =

q4.2 =

q4.3 =

q4.4 =

q4.5 =

q4.6 =

q4.7 =
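
As in the sketch after Question 2, the joint is the prior times the product of per-feature conditionals and the posterior is that joint normalised over the three labels; only the tables change (the Laplace-smoothed estimates from Question 3 replace the maximum-likelihood ones), not the procedure.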


Question 5
?/? point (graded)

When training a classifier, it is common to split the available data into a training set, a hold-out set, and a test set, each of which has a different role.

Which data set is used to learn the conditional probabilities?


Question 6
?/? point (graded)

Which data set is used to tune the Laplace Smoothing hyperparameters?


Question 7
?/? point (graded)

Which data set is used for quantifying performance results?


Question 8
?/? point (graded)

Consider a context-free grammar with the following rules (assume that S is the start symbol):

S → NP VP

NP → DT NN

NP → NP PP

PP → IN NP

VP → VB NP

DT → the

NN → man

NN → dog

NN → cat

NN → park

VB → saw

IN → in

IN → with

IN → under 


How many parse trees are there under this grammar for the sentence: the man saw the dog in the park? 
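
One way to count parse trees without enumerating them by hand is a CKY-style dynamic program; a minimal sketch follows. Every rule above is either binary or lexical, so CKY applies directly; count_parses is just an illustrative name, and the same function answers the next question when given the longer sentence.

from collections import defaultdict

lexical = {          # terminal word -> set of preterminals that produce it
    'the': {'DT'}, 'man': {'NN'}, 'dog': {'NN'}, 'cat': {'NN'}, 'park': {'NN'},
    'saw': {'VB'}, 'in': {'IN'}, 'with': {'IN'}, 'under': {'IN'},
}
binary = [           # (parent, left child, right child)
    ('S', 'NP', 'VP'), ('NP', 'DT', 'NN'), ('NP', 'NP', 'PP'),
    ('PP', 'IN', 'NP'), ('VP', 'VB', 'NP'),
]

def count_parses(sentence, start='S'):
    words = sentence.split()
    n = len(words)
    # chart[i][j][X] = number of distinct ways nonterminal X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for pre in lexical.get(w, ()):
            chart[i][i + 1][pre] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for split in range(i + 1, j):
                for parent, left, right in binary:
                    chart[i][j][parent] += chart[i][split][left] * chart[split][j][right]
    return chart[0][n][start]

print(count_parses('the man saw the dog in the park'))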


Question 9
?/? point (graded)

Following the previous question, how many parse trees are there for the sentence: the man saw the dog in the park with the cat?


Question 10
?/? point (graded)

The K-means algorithm:
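
In case a refresher helps for this question, here is a minimal sketch of the standard K-means loop (alternating assignment and mean-update steps until the centers stop moving); the points and k below are arbitrary placeholders.

import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break                      # converged (to a local optimum)
        centers = new_centers
    return centers, labels

pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
print(kmeans(pts, k=2))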


Question 11
?/? point (graded)

Consider the following PCFG (probabilities for each rule are shown after the rule):

S → NP VP 1.0

PP → P NP 1.0

VP → V NP 0.6

VP → VP PP 0.4

P → with 0.8

P → in 0.2

V → saw 0.7

V → look 0.3

NP → NP PP 0.3

NP → Astronomers 0.12

NP → ears 0.18

NP → saw 0.02

NP → stars 0.18

NP → telescopes 0.2


What is the probability of the best parse tree for the sentence: Astronomers saw stars with ears?
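
A minimal sketch of how this can be computed with a Viterbi-style CKY pass: the best score for a span and nonterminal is the maximum, over rules and split points, of the rule probability times the best scores of the two children. All rules above are binary or lexical, so plain CKY applies; max_parse_prob is just an illustrative name, and the grammar encoded below is the one given above.

from collections import defaultdict

lexical = {               # word -> list of (symbol, probability)
    'astronomers': [('NP', 0.12)], 'ears': [('NP', 0.18)], 'stars': [('NP', 0.18)],
    'telescopes': [('NP', 0.2)], 'saw': [('V', 0.7), ('NP', 0.02)],
    'look': [('V', 0.3)], 'with': [('P', 0.8)], 'in': [('P', 0.2)],
}
binary = [                # (parent, left child, right child, probability)
    ('S', 'NP', 'VP', 1.0), ('PP', 'P', 'NP', 1.0),
    ('VP', 'V', 'NP', 0.6), ('VP', 'VP', 'PP', 0.4), ('NP', 'NP', 'PP', 0.3),
]

def max_parse_prob(sentence, start='S'):
    words = sentence.lower().split()
    n = len(words)
    # best[i][j][X] = probability of the best derivation of words[i:j] from X
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for sym, p in lexical.get(w, ()):
            best[i][i + 1][sym] = max(best[i][i + 1][sym], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for split in range(i + 1, j):
                for parent, left, right, p in binary:
                    cand = p * best[i][split][left] * best[split][j][right]
                    if cand > best[i][j][parent]:
                        best[i][j][parent] = cand
    return best[0][n][start]

print(max_parse_prob('Astronomers saw stars with ears'))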